Le Zhang

How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning

May 26, 2026
Via arXiv

DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding

May 26, 2026
Via arXiv

How Far Has AI Come in Liver Fibrosis Staging? A Large-Scale Real-World Dataset and Benchmark

May 25, 2026
Via arXiv

RiT: Vanilla Diffusion Transformers Suffice in Representation Space

May 21, 2026
Via arXiv

From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs

May 04, 2026
Via arXiv

MedFlowSeg: Flow Matching for Medical Image Segmentation with Frequency-Aware Attention

Apr 21, 2026
Via arXiv

Make It Up: Fake Images, Real Gains in Generalized Few-shot Semantic Segmentation

Mar 28, 2026
Via arXiv

End-to-End Dexterous Grasp Learning from Single-View Point Clouds via a Multi-Object Scene Dataset

Mar 16, 2026
Via arXiv

Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion

Mar 15, 2026
Via arXiv

Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents

Mar 12, 2026
Via arXiv